95 research outputs found

    Log Mining Using Generalized Association Rules

    Explosive growth in the size and usage of the World Wide Web has made it necessary for Web site administrators to track and analyze the navigation patterns of Web site visitors. To achieve this goal, a web mining tool is necessary. Web mining can be defined as the use of data mining techniques to automatically discover and extract information from web documents. Since data mining is primarily concerned with the discovery of knowledge and aims to provide answers to questions that people do not know how to ask, it is not an automatic process. Rather, one has to exhaustively explore very large volumes of data to determine otherwise hidden relationships. The process extracts high-quality information that can be used to draw conclusions based on relationships or patterns within the data. However, data mining techniques are not easily applicable to Web data because of problems related both to the technology underlying the Web and to the lack of standards in the design and implementation of Web pages. Information collected by Web servers and kept in the server log is the main source of data for analyzing user navigation patterns. Once logs have been pre-processed and sessions have been obtained, several kinds of access pattern mining can be performed, depending on the needs of the analyst. Because the method used in this study relies on relatively simple techniques, the information gathered is adequate for real user profile data, although the noise in the data has to be tackled first. In this study, a data mining technique known as generalized association rules was used to gain some insight into website usage patterns. For the purpose of this study, server logs from the tutor.com portal were retrieved, pre-processed and analyzed. An important finding from this study is that the Mathematics subject is generally popular at the UPSR, PMR and UPSR levels. In contrast, arts subjects are not popular with Tutor.com users. The system administrator may consider evaluating the content and the links for such subjects, so that the real problem can be identified.
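
The idea behind generalized association rules can be sketched roughly as follows: each session is extended with the ancestors of its pages in a taxonomy, so that frequent patterns can surface at the subject level even when no single page is visited often. The taxonomy, page names and sessions below are invented for illustration and are not taken from the tutor.com logs; the pair-counting step is a simplification, not the study's algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy (child -> parent); not from the study's data.
TAXONOMY = {
    "/upsr/math/fractions": "UPSR Mathematics",
    "/upsr/math/decimals": "UPSR Mathematics",
    "/pmr/math/algebra": "PMR Mathematics",
    "/pmr/art/drawing": "PMR Arts",
    "UPSR Mathematics": "Mathematics",
    "PMR Mathematics": "Mathematics",
    "PMR Arts": "Arts",
}


def ancestors(item):
    """Return every ancestor of an item in the taxonomy."""
    found = []
    while item in TAXONOMY:
        item = TAXONOMY[item]
        found.append(item)
    return found


def extend_session(session):
    """Add all ancestors of each visited page to the session."""
    extended = set(session)
    for page in session:
        extended.update(ancestors(page))
    return extended


def frequent_pairs(sessions, min_support):
    """Return item pairs whose support is at least min_support."""
    counts = Counter()
    for session in sessions:
        items = sorted(extend_session(session))
        for a, b in combinations(items, 2):
            # An item paired with its own ancestor always co-occurs,
            # so such pairs carry no information and are skipped.
            if a in ancestors(b) or b in ancestors(a):
                continue
            counts[(a, b)] += 1
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical sessions: no page-level pair repeats, but the
# subject-level pair does.
sessions = [
    ["/upsr/math/fractions", "/pmr/math/algebra"],
    ["/upsr/math/decimals", "/pmr/math/algebra"],
    ["/pmr/art/drawing"],
]
pairs = frequent_pairs(sessions, min_support=0.6)
for pair, support in sorted(pairs.items()):
    print(pair, round(support, 2))
```

In this toy input, every page-level pair appears in only one of the three sessions, while the generalized pair of UPSR Mathematics and PMR Mathematics appears in two, which is the kind of pattern the taxonomy is meant to expose.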

    Two-class classification: comparative experiments for chronic kidney disease

    More than two million people worldwide currently depend on dialysis treatment or a kidney transplant to survive kidney disease. It is therefore imperative for health agencies such as hospitals or insurance companies to predict the probability that a patient suffers from chronic kidney disease and hence requires medical attention. This study performs a comparative experiment on the prediction of chronic kidney disease via a classification methodology. Two supervised classification algorithms are used to build the classification model: Two-Class Decision Forest and Two-Class Neural Network. Experimental results showed that Neural Network performed better based on all features, but Decision Forest produced optimal performance with high accuracy and precision compared to Neural Network and to other algorithms from the literature such as K-Nearest Neighbor, Support Vector Machine, and Rule Induction.
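
As a minimal sketch of how two binary classifiers might be compared on accuracy and precision, the snippet below scores two sets of predictions against the same labels. The labels and predictions are made up for illustration; they are not the study's data or results, and the variable names are hypothetical.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = correct / len(pairs)
    # Precision is undefined when nothing is predicted positive; use 0.0.
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


# Invented labels (1 = chronic kidney disease) and invented predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
model_a_pred = [1, 1, 0, 0, 0, 0, 1, 0]
model_b_pred = [1, 1, 1, 1, 0, 1, 1, 0]

model_a = accuracy_and_precision(y_true, model_a_pred)
model_b = accuracy_and_precision(y_true, model_b_pred)
print("model A:", model_a)
print("model B:", model_b)
```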

    Elderly care monitoring system with IoT application

    Falls among the elderly can have serious consequences, such as injury or even death. It is therefore essential that falls are detected early, and one way to do that is by using an IoT platform. The authors have been developing a wearable device for an elderly monitoring system that utilizes an accelerometer. The data from the accelerometer is sent to an Internet-of-Things (IoT) platform called ThingSpeak™. Through the IoT platform, elderly patients can be remotely monitored as long as the care providers have good internet access. The paper presents the experimental results of determining the sensitivity and specificity of the accelerometer used in the proposed system. This is the first step in developing accurate data acquisition for monitoring purposes. Based on the experimental results, the average sensitivity obtained for this device is 73.3%, while the average specificity is 89.3%. Both the sensitivity and specificity tests show promising results, which indicate that the device has a fail rate of only 26.7% and an error rate of only 10.7%.
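
For reference, sensitivity and specificity follow from the counts of a confusion matrix, and the quoted fail and error rates are their complements. The sketch below uses the standard definitions; the trial counts are hypothetical and are not the experiment's data, while the two percentages at the end are the ones reported in the abstract.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of real falls that were detected
    specificity = tn / (tn + fp)  # share of non-falls correctly ignored
    return sensitivity, specificity


# Hypothetical trial counts, for illustration only.
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=18, fp=2)
print(sens, spec)

# The rates in the abstract are complements of the reported averages.
fail_rate = round(100 - 73.3, 1)   # missed falls, in percent
error_rate = round(100 - 89.3, 1)  # false alarms, in percent
print(fail_rate, error_rate)
```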

    Data pre-processing on web server logs for generalized association rules mining algorithm

    Web log file analysis began as a way for IT administrators to ensure adequate bandwidth and server capacity on their organization's website. Log file data can offer valuable insight into web site usage. It reflects actual usage in natural working conditions, compared to the artificial setting of a usability lab. It represents the activity of many users over a potentially long period of time, compared to a limited number of users for an hour or two each. This paper describes the pre-processing techniques applied to IIS Web Server logs, from the raw log file up to the point at which the mining process can be performed. Pre-processing is a tedious process, and it depends on the algorithm and the purpose of the application.
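
A first pre-processing pass over an IIS log in the W3C extended format might look like the sketch below: read the `#Fields:` directive to learn the column names, turn each request line into a record, then drop requests that are unlikely to be page views. The cleaning rules (discarding image and stylesheet requests and non-2xx responses) are common heuristics assumed here, not the paper's stated procedure, and the sample log lines are invented.

```python
# File suffixes treated as non-page resources (an assumption).
NON_PAGE_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def parse_w3c_log(lines):
    """Parse W3C extended log lines into a list of dicts."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive lists the column names for the lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software or #Date
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines
        records.append(dict(zip(fields, values)))
    return records


def clean(records):
    """Keep successful requests for pages only."""
    kept = []
    for record in records:
        uri = record.get("cs-uri-stem", "").lower()
        status = record.get("sc-status", "")
        if uri.endswith(NON_PAGE_SUFFIXES):
            continue
        if not status.startswith("2"):
            continue
        kept.append(record)
    return kept


# Invented sample log, for illustration only.
sample_log = """#Software: Microsoft Internet Information Services
#Fields: date time c-ip cs-method cs-uri-stem sc-status
2005-03-01 08:00:01 10.0.0.1 GET /index.asp 200
2005-03-01 08:00:02 10.0.0.1 GET /images/logo.gif 200
2005-03-01 08:00:09 10.0.0.2 GET /missing.asp 404
"""
records = parse_w3c_log(sample_log.splitlines())
pages = clean(records)
print(len(records), "requests parsed,", len(pages), "kept")
```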

    Comparing the knowledge quality in rough classifier and decision tree classifier

    This paper presents a comparative study of two rule-based classifiers: rough set (Rc) and decision tree (DTc). The two techniques take different approaches to classification but produce the same structure of output with comparable results. Theoretically, different classifiers will generate different sets of rules via knowledge, even when they are applied to the same classification problem. Hence, the aim of this paper is to investigate the quality of the knowledge produced by Rc and DTc when similar problems are presented to them. Four important performance metrics are used for the comparison: classification accuracy, rule quantity, rule length and rule coverage. Five datasets from the UCI Machine Learning Repository are chosen and mined using the Rc toolkit ROSETTA, while the C4.5 algorithm in the WEKA application is chosen as the DTc rule generator. The experimental results show that Rc and DTc are both capable of generating quality knowledge, since most of the results are comparable. Rc performs better as an accurate classifier and produces shorter and simpler rules with higher coverage. Meanwhile, DTc clearly generates fewer rules, with a significant difference.
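
The rule-level metrics can be illustrated with a small sketch. The abstract does not define them, so the definitions here are assumptions: quantity is the number of rules, length is the number of conditions in a rule, and coverage is the fraction of records that satisfy a rule's conditions. The rules and records are invented and are not output from ROSETTA or WEKA.

```python
def matches(rule, record):
    """True when the record satisfies every condition of the rule."""
    return all(record.get(attr) == value for attr, value in rule["if"].items())


def rule_set_summary(rules, records):
    """Summarize a rule set by quantity, mean length and mean coverage."""
    lengths = [len(rule["if"]) for rule in rules]
    coverages = [
        sum(1 for rec in records if matches(rule, rec)) / len(records)
        for rule in rules
    ]
    return {
        "quantity": len(rules),
        "mean_length": sum(lengths) / len(rules),
        "mean_coverage": sum(coverages) / len(rules),
    }


# Invented toy data, for illustration only.
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]
rules = [
    {"if": {"windy": "no"}, "then": "yes"},
    {"if": {"outlook": "sunny", "windy": "yes"}, "then": "no"},
]
summary = rule_set_summary(rules, records)
print(summary)
```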

    Pattern extraction for programming performance evaluation using directed apriori

    Computer programming is taught as a core subject in Information Technology related studies. It is one of the most essential skills that each student has to acquire. However, there is still a small number of students who are unable to write a program well. Several studies have indicated that many factors can affect student programming performance. Thus, the objective of this paper is to investigate the significant factors that may influence students' programming performance using information from previous student performance. Since data mining is able to discover hidden knowledge in databases, a programming dataset comprising the performance profiles of Bachelor of Information Technology students of the Faculty of IT, Universiti Utara Malaysia, in the year 2004-2005 was explored using data mining techniques. The dataset, which consists of 421 records with 70 attributes of mixed types, was pre-processed and then mined using a directed association rule (AR) mining algorithm, namely apriori. The results indicated that having programming experience before starting to learn programming at university, and scoring well in the Mathematics and English subjects during SPM, were among the factors that contribute to good programming grades.
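
One way to read "directed" association rule mining is that the rule consequent is fixed to a target item, and the level-wise apriori search runs only over the antecedents. The sketch below follows that reading, which is an assumption about the paper's method. The records, attribute names and thresholds are invented for illustration and are not drawn from the 421-record dataset.

```python
from itertools import combinations


def directed_rules(records, target, min_support, min_confidence, max_len=2):
    """Find rules 'antecedent -> target' with a level-wise search."""
    transactions = [frozenset(r) for r in records]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Items of the target attribute itself may not appear as antecedents.
    prefix = target.split("=")[0] + "="
    items = sorted({i for t in transactions for i in t if not i.startswith(prefix)})
    frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

    rules = []
    size = 1
    while frequent and size <= max_len:
        for antecedent in frequent:
            rule_support = support(antecedent | {target})
            confidence = rule_support / support(antecedent)
            if rule_support >= min_support and confidence >= min_confidence:
                rules.append((tuple(sorted(antecedent)), rule_support, confidence))
        # Apriori step: only frequent itemsets are joined into larger candidates.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == size + 1}
        frequent = sorted((c for c in candidates if support(c) >= min_support), key=sorted)
        size += 1
    return rules


# Invented student records, for illustration only.
records = [
    {"prior_experience=yes", "math=A", "grade=good"},
    {"prior_experience=yes", "math=A", "grade=good"},
    {"prior_experience=no", "math=A", "grade=good"},
    {"prior_experience=no", "math=C", "grade=poor"},
    {"prior_experience=yes", "math=C", "grade=poor"},
]
rules = directed_rules(records, "grade=good", min_support=0.4, min_confidence=0.8)
for antecedent, rule_support, confidence in rules:
    print(antecedent, "-> grade=good", rule_support, confidence)
```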

    Discovering usage patterns from web server logs

    As the amount of information available on the World Wide Web (WWW) increases rapidly, the number of sites that hold particular information also increases. In order to gain some insight into site usage, a system administrator needs tools that can aid in analyzing the site’s usage. To achieve this goal, a web mining tool is necessary to discover the usage pattern of a particular site. For the purpose of this study, server logs from an educational portal were retrieved, pre-processed and analyzed. Information collected by the Web servers is kept in the server logs and used as the main source of data for analyzing users’ navigation patterns. Once the server logs have been pre-processed and sessions have been obtained, several kinds of access pattern mining can be performed, depending on the needs of the analyst. In this study, a data mining technique known as Generalized Association Rule was used to gain some insight into website usage patterns. The findings from this study provide an overview of the usage pattern of a particular educational portal. The study also demonstrates how Generalized Association Rule can be used in site usage analysis. Such a technique enables the discovery of hidden information within web server logs.
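
Obtaining sessions from pre-processed log records is often done with a timeout heuristic: requests from the same client belong to one session until a gap longer than some threshold occurs. The sketch below assumes a 30-minute timeout and identifies clients by IP address; both are common conventions assumed here rather than details given in the abstract, and the requests are invented.

```python
from datetime import datetime, timedelta


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Group (ip, timestamp, url) requests into per-client sessions."""
    sessions = []
    last_seen = {}  # ip -> (time of last request, page list of open session)
    for ip, timestamp, url in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(ip)
        if previous is None or timestamp - previous[0] > timeout:
            # First request from this client, or the gap was too long.
            pages = []
            sessions.append((ip, pages))
        else:
            pages = previous[1]
        pages.append(url)
        last_seen[ip] = (timestamp, pages)
    return sessions


# Invented requests, for illustration only.
requests = [
    ("10.0.0.1", datetime(2005, 3, 1, 8, 0), "/index.asp"),
    ("10.0.0.1", datetime(2005, 3, 1, 8, 5), "/math.asp"),
    ("10.0.0.2", datetime(2005, 3, 1, 8, 6), "/index.asp"),
    ("10.0.0.1", datetime(2005, 3, 1, 9, 0), "/index.asp"),
]
sessions = sessionize(requests)
for ip, pages in sessions:
    print(ip, pages)
```

Each resulting page list can then serve as one transaction for association rule mining.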

    The preferable test documentation using IEEE 829

    During software development, testing is one of the processes used to find errors, and it aims to evaluate whether a program meets its required results. The testing phase involves several testing activities, including the user acceptance test, the test procedure and others. If no documentation is produced during the testing phase, difficulties that arise during testing may go unsolved, because there is no reference that testers can consult to overcome the same problem. IEEE 829 is one of the standards that address these requirements. The standard specifies several documents to be produced during testing, covering test preparation, test execution and test completion. In this paper we used this standard as a guideline to analyze which documentation our companies prefer the most. From our analytical study, most companies in Malaysia prepare the Test Plan and Test Summary documents.